Combining Text and Heuristics for Cost-Sensitive Spam Filtering

Authors

José María Gómez Hidalgo

Manuel Maña López

Enrique Puertas Sanz

Abstract

Spam filtering is a text categorization task with special features that make it both interesting and difficult. First, the task has traditionally been performed using heuristics from the domain. Second, a cost model is required to avoid misclassifying legitimate messages. We present a comparative evaluation of several machine learning algorithms applied to spam filtering, considering both the text of the messages and a set of heuristics for the task. Cost-oriented biasing and evaluation are performed.
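
To make the idea of a cost model concrete, here is a minimal sketch of cost-sensitive thresholding and evaluation. It is not taken from the paper: the abstract does not specify its cost model, so the code assumes the common formulation in which blocking a legitimate message is `cost_ratio` times as costly as letting a spam message through. The names `spam_threshold`, `classify`, and `weighted_accuracy` are hypothetical, chosen for illustration only.

```python
from typing import Sequence


def spam_threshold(cost_ratio: float) -> float:
    """Probability above which labelling a message as spam minimises expected cost.

    Assumption: a false positive (legitimate mail blocked) costs `cost_ratio`
    times as much as a false negative (spam let through). Labelling as spam
    costs (1 - p) * cost_ratio in expectation, labelling as legitimate costs p,
    so spam is the cheaper label only when p > cost_ratio / (1 + cost_ratio).
    """
    return cost_ratio / (1.0 + cost_ratio)


def classify(p_spam: float, cost_ratio: float) -> bool:
    """Return True (spam) only if the estimated spam probability clears the threshold."""
    return p_spam > spam_threshold(cost_ratio)


def weighted_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool],
                      cost_ratio: float) -> float:
    """Accuracy in which each legitimate message counts `cost_ratio` times.

    True means spam, False means legitimate.
    """
    correct = 0.0
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        weight = 1.0 if truth else cost_ratio  # legitimate messages weigh more
        total += weight
        if truth == pred:
            correct += weight
    return correct / total
```

With `cost_ratio = 9`, for instance, a message is labelled spam only when its estimated spam probability exceeds 0.9, which biases any probabilistic classifier toward protecting legitimate mail.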

Similar resources

Stacking Classifiers for Anti-Spam Filtering of E-Mail

We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial email, or “spam”, floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the eff...

A Memory-Based Approach to Anti-Spam Filtering

This paper presents an extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, also known as “spam”, floods the mailboxes of users, causing frustration, wasting bandwidth and money, and exposing minors to unsuitable content. Using a recently introduced publicly availa...

Active Multi-Field Learning for Spam Filtering

Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem, and proposes a universal approach to fight various spam messages. The proposed active multi-field learning approach is based on: 1) It is cost-sensitive to obtain a label for a real-world spam filter, which suggests an active learning idea; and 2) Different messag...

A general-purpose sentence-level nonsense detector

I have constructed a sentence-level nonsense detector, with the goal of discriminating well-formed English sentences from the large volume of fragments, headlines, incoherent drivel, and meaningless snippets present in internet text. For many NLP tasks, the availability of large volumes of internet text is enormously helpful in combating the sparsity problem inherent in modeling language. Howev...

An evaluation of Naive Bayesian anti-spam filtering

It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail (“spam”). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter’s performan...

Publication date: 2000
